Privacy Risks from Genomic Data-Sharing Beacons.

Authors

  • Suyash S Shringarpure
  • Carlos D Bustamante
Abstract

The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries--such as "Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?"--with either "yes" or "no." Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
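The membership test described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the beacon is modelled as a set of sites at which any stored genome carries the queried allele, and the two response probabilities (`P_YES_MEMBER`, `P_YES_NONMEMBER`) are hypothetical fixed values rather than quantities derived as in the paper. Allele frequencies are used here only to generate the toy genomes, not by the test itself. All function names are illustrative.

```python
import math
import random

def simulate_genome(rng, freqs):
    """Toy genome: the set of site indices at which it carries the queried allele."""
    return {i for i, f in enumerate(freqs) if rng.random() < f}

def make_beacon(genomes):
    """Return a query function answering 'yes' (True) if any stored genome has the allele."""
    present = set().union(*genomes)
    return lambda site: site in present

def log_likelihood_ratio(responses, p_yes_member, p_yes_nonmember):
    """Sum over queries of log P(response | member) / P(response | non-member).

    Positive values favour membership; negative values favour non-membership.
    """
    llr = 0.0
    for yes in responses:
        if yes:
            llr += math.log(p_yes_member / p_yes_nonmember)
        else:
            llr += math.log((1 - p_yes_member) / (1 - p_yes_nonmember))
    return llr

# Assumed (hypothetical) response probabilities for the two hypotheses.
P_YES_MEMBER = 0.999     # a member's alleles almost always get "yes" (small error rate)
P_YES_NONMEMBER = 0.8    # a non-member's alleles get "yes" less often

rng = random.Random(0)
freqs = [rng.uniform(0.001, 0.05) for _ in range(5000)]   # toy rare-allele frequencies
genomes = [simulate_genome(rng, freqs) for _ in range(50)]  # toy beacon of 50 genomes
beacon = make_beacon(genomes)

member = genomes[0]                      # individual who is in the beacon
outsider = simulate_genome(rng, freqs)   # individual who is not

# Query the beacon only at sites where the target carries the allele.
member_llr = log_likelihood_ratio(
    [beacon(s) for s in sorted(member)], P_YES_MEMBER, P_YES_NONMEMBER)
outsider_llr = log_likelihood_ratio(
    [beacon(s) for s in sorted(outsider)], P_YES_MEMBER, P_YES_NONMEMBER)

print("member LLR:", member_llr)
print("outsider LLR:", outsider_llr)
```

In this toy setting every query for the member returns "yes", so the statistic grows with the number of queries, while each "no" returned for the outsider pushes the statistic strongly toward non-membership. A real attack would also need a decision threshold chosen for a target false-positive rate, which this sketch omits.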

Similar resources

A utility maximizing and privacy preserving approach for protecting kinship in genomic databases

Motivation Rapid and low cost sequencing of genomes enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such information is kinship. Results We define t...

Balancing the risks and benefits of genomic data sharing: genome research participants' perspectives.

BACKGROUND Technological advancements are rapidly propelling the field of genome research forward, while lawmakers attempt to keep apace with the risks these advances bear. Balancing normative concerns of maximizing data utility and protecting human subjects, whose privacy is at risk due to the identifiability of DNA data, are central to policy decisions. Research on genome research participant...

How Can Photo Sharing Inspire Sharing Genomes?

People usually are aware of the privacy risks of publishing photos online, but these risks are less evident when sharing human genomes. Modern photos and sequenced genomes are both digital representations of real lives. They contain private information that may compromise people’s privacy, and still, their highest value is most of times achieved only when sharing them with others. In this work,...

Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks

The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context-a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or "beacon") is responsible for assuring that genomic data are exposed through the Beacon s...

Challenges of web-based personal genomic data sharing

In order to study the relationship between genes and diseases, the increasing availability and sharing of phenotypic and genotypic data have been promoted as an imperative within the scientific community. In parallel with data sharing practices by clinicians and researchers, recent initiatives have been observed in which individuals are sharing personal genomic data. The involvement of individu...

Journal:
  • American Journal of Human Genetics

Volume 97, Issue 5

Pages -

Publication date: 2015